Yet Another Locking Article
Stephen Beaulieu <hippo@be.com>

It is funny, but somewhat fitting, that many times the Newsletter article you intend to write is not really the Newsletter article you end up writing.  With the best of intentions, I chose to follow a recent trend in articles and talk about multithreaded programming and locking down critical sections of code and resources.  The vehicle for my discussion was to be a Multiple-Reader Single-Writer locking class in the mold of BLocker, complete with Lock(), Unlock(), IsLocked(), and an Autolocker-style utility class.  Needless to say, the class I was expecting is a far cry from the one I will present today.

In the hopes of this being my first short Newsletter article, I will leave the details of the class to the sample code.  For once it was carefully prepared ahead of time and is reasonably commented.  I will briefly point out two neat features of the class before heading into a short discussion of locking styles.  The first function to look at is IsWriteLocked(), which shows a way to cache the base address of a thread's stack and use it to identify the thread much faster than our old standby find_thread(NULL).  The stack_base method is not infallible, and needs to be backed up by find_thread(NULL) when there is no match, but it is considerably faster when a match is found; it is akin to the benaphore technique for speeding up semaphores.  The other functions to look at are register_thread() and unregister_thread().  These are debug functions that track which threads hold a read lock by creating a state array with room for every possible thread: each thread gets an individual slot, found by computing thread_id % max_possible_threads.  Again, the code itself covers these in good detail.  I hope you find the class useful.  A few of the design decisions I made are detailed in the discussion below.
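In rough outline, the slot scheme looks something like this.  The names and the array size here are illustrative only, not the actual MultiLocker sample code:

```cpp
#include <cassert>
#include <cstdint>

// Illustrative sketch of the register_thread()/unregister_thread() idea:
// a fixed array with one slot per possible thread, indexed by
// thread_id % kMaxPossibleThreads.  All names are hypothetical.
const int32_t kMaxPossibleThreads = 256;  // illustrative size

struct ThreadSlot {
    int32_t thread;  // id of the thread holding this slot, 0 if free
    int32_t count;   // read-lock count, kept only to catch nesting bugs
};

static ThreadSlot debug_array[kMaxPossibleThreads];  // zero-initialized

static int32_t slot_for(int32_t thread) {
    return thread % kMaxPossibleThreads;
}

static void register_thread(int32_t thread) {
    ThreadSlot &slot = debug_array[slot_for(thread)];
    slot.thread = thread;
    slot.count++;
}

static void unregister_thread(int32_t thread) {
    ThreadSlot &slot = debug_array[slot_for(thread)];
    if (--slot.count == 0)
        slot.thread = 0;  // slot is free again
}

static bool holds_read_lock(int32_t thread) {
    const ThreadSlot &slot = debug_array[slot_for(thread)];
    return slot.thread == thread && slot.count > 0;
}
```

The modulo gives each live thread its own slot without a lookup table, at the cost of reserving space for every possible thread id.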

I want to take a little space to discuss locking philosophies and their trade-offs.  The two opposing views can be summarized briefly as 'Lock Early, Lock Often' and 'Lock Only When and Where Necessary'.  These philosophies sit at opposite ends of the spectrum of ease of use versus efficiency, and both have their adherents in the company (understanding, of course, that most engineers here fall comfortably in the middle ground).

The 'Lock Early, Lock Often' view rests on the idea that if you are uncertain exactly where you need to lock, it is better to err on the side of locking your resources.  It advises that all locking classes should support 'nested' calls to Lock(): in other words, if a thread that already holds a lock calls Lock() again, it should be allowed to continue rather than deadlocking while it waits for itself to release the lock.  This increases the safety of the lock by allowing you to wrap all of your functions in Lock()/Unlock() pairs and letting the lock itself keep track of whether it actually needs to be acquired.  An extension of this idea is the autolocking class, which acquires a lock in its constructor and releases it in its destructor.  By allocating one of these on the stack you can be certain that you will safely hold the lock for the duration of your function.
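The autolocking pattern can be sketched in a few lines.  This is not the actual Autolocker shipped with the kits; the CountingLock below is just a stand-in with the same Lock()/Unlock() shape as BLocker:

```cpp
#include <cassert>

// Minimal sketch of an autolocker: acquire in the constructor, release
// in the destructor, so a stack-allocated instance holds the lock for
// exactly the scope it lives in.
template <typename LockType>
class AutoLocker {
public:
    explicit AutoLocker(LockType &lock) : fLock(lock) { fLock.Lock(); }
    ~AutoLocker() { fLock.Unlock(); }

private:
    LockType &fLock;
    AutoLocker(const AutoLocker &);             // non-copyable
    AutoLocker &operator=(const AutoLocker &);
};

// Hypothetical stand-in lock that just counts nesting depth; enough to
// show that Lock() and Unlock() calls always come in matched pairs.
struct CountingLock {
    int depth;
    CountingLock() : depth(0) {}
    void Lock()   { depth++; }
    void Unlock() { depth--; }
};

static CountingLock gLock;

static int guarded_value() {
    AutoLocker<CountingLock> autolock(gLock);  // held until return
    return gLock.depth;                        // 1 while the lock is held
}
```

The point of the pattern is that every return path out of guarded_value(), including early returns, runs the destructor and so releases the lock.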

The main advantage of the 'Lock Early, Lock Often' strategy is its simplicity.  It is really very easy to add locking to your applications: create an Autolock at the top of each of your functions and be assured that it will do its magic.  The downside of this philosophy is that the lock itself needs to get smarter and hold onto state information, which can cost some space and speed.

At the other end of the spectrum is the 'Lock Only When and Where Necessary' camp.  This philosophy holds that programmers using the 'Lock Early, Lock Often' strategy do not understand the locking requirements of their applications, and that this is essentially a bug waiting to happen.  In addition, the overhead added by locking where it is unnecessary (say, in a function that is only called from within another function that already holds the lock) and by using an additional class to manage the lock makes the application larger and less efficient.  This view instead tells people to really design their applications and to fully understand the implications of the locking mechanisms they choose.

So, which is correct?  I think it often depends on the trade-offs you are willing to make.  For locks with only a single owner, the state information needed is very small, and the lock's method of determining whether a thread holds it is usually fairly efficient (see the stack_base trick mentioned above for making it a bit faster).  Another consideration is how important speed and size are when dealing with the lock.  In a very crucial area of an important, busy program like the app_server, increasing efficiency can be paramount.  In that case it is much, much better to take the extra time to really understand the locking necessary and to reduce the overhead.  Even better would be to design a global application architecture that makes the flow of information clear, and correspondingly makes the locking mechanisms much better (along with everything else).

The MultiLocker sample code provided leans far to the efficiency side.  The class allows multiple readers to acquire the lock, but does not allow those readers to make nested ReadLock() calls.  The overhead of keeping state for each reader (storage space, and stomping through that storage every time a ReadLock() or ReadUnlock() call is made) was simply too great.  Writers, on the other hand, have complete control over the lock, and may make ReadLock() or additional WriteLock() calls after the lock has been acquired.  This allows a little design flexibility, so that functions that read information protected by the lock can safely be called by a writer without code duplication.
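The nesting policy just described can be modeled in a small skeleton.  To be clear, this is not the MultiLocker itself: the real class blocks on semaphores, while this hypothetical sketch only models the bookkeeping, with the caller's thread id passed in explicitly so the rules are easy to see:

```cpp
#include <cassert>
#include <cstdint>

// Illustrative model of the MultiLocker nesting rules: readers may not
// nest, but the thread holding the write lock may make further
// WriteLock() or ReadLock() calls freely.  All blocking is elided.
class NestingPolicy {
public:
    NestingPolicy() : fWriter(-1), fWriteNest(0) {}

    bool WriteLock(int32_t thread) {
        if (fWriter == thread) {  // nested write: just count it
            fWriteNest++;
            return true;
        }
        // the real lock would block here until readers and any
        // current writer have drained
        fWriter = thread;
        fWriteNest = 1;
        return true;
    }

    bool WriteUnlock(int32_t thread) {
        if (fWriter != thread)
            return false;         // unlocking a lock you do not hold
        if (--fWriteNest == 0)
            fWriter = -1;         // the real lock would wake waiters
        return true;
    }

    bool ReadLock(int32_t thread) {
        if (fWriter == thread)
            return true;          // the writer may read; no state kept
        // the real lock would block while another thread holds the
        // write lock, then bump the reader count
        return true;
    }

private:
    int32_t fWriter;     // thread id of the current writer, -1 if none
    int32_t fWriteNest;  // depth of nested WriteLock() calls
};
```

Because only the single writer can nest, one id and one counter are all the state the lock needs outside of DEBUG mode, which is exactly where the efficiency comes from.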

The class does have a debug mode that keeps state information about readers so you can be sure that you are not performing nested ReadLock()s.  The class also has timing functions, so you can see how long each call takes in DEBUG mode and, with slight modifications to the class, measure the benefits of the stack_base caching noted above.  I have included some extensive timing information from my computers for you to look at, or you can run your own tests with the test app included.  Note that the numbers listed are pretty close to the raw overhead of the lock itself, as writers only increment a counter and readers simply access that counter.

The sample code can be found at:

<ftp://ftp.be.com/pub/samples/r3/support_kit/MultiLocker.zip>

The class should be pretty efficient, and you are free to use it and make adjustments as necessary.  My thanks go out to Pierre and George from the app_server team, for the original lock on which this is based, and for their assistance with (and insistence on) the efficiency concerns.

